Suffix Tree
Abstract
SYNONYMS
Compact suffix trie

DEFINITION
The suffix tree S(y) of a non-empty string y of length n is a compact trie representing all the suffixes of the string. The suffix tree of y is defined by the following properties:
• All branches of S(y) are labeled by suffixes of y, and every suffix of y labels a branch.
• Edges of S(y) are labeled by strings.
• Internal nodes of S(y) have at least two children.
• Edges outgoing from an internal node are labeled by segments starting with different letters.
• The segments are represented by their starting position on y and their lengths.
Moreover, it is assumed that y ends with a symbol occurring nowhere else in it (the space sign is used in the examples of the present entry). This avoids marking nodes, and implies that S(y) has exactly n leaves (the number of non-empty suffixes). All the properties then imply that the total size of S(y) is O(n), which makes it possible to design a linear-time construction of the suffix tree.

HISTORICAL BACKGROUND
The first linear-time algorithm for building a suffix tree is from Weiner [15], but it requires quadratic space: O(n × σ), where σ is the size of the alphabet. The first linear time and space algorithm for building a suffix tree is from McCreight [13]. It works "off-line", i.e., it inserts the suffixes from the longest one to the shortest one. A strictly sequential version of the construction of the suffix tree was described by Ukkonen [14]. When the alphabet is potentially infinite, the construction algorithms of the suffix tree can be implemented to run in time O(n log σ) and are then optimal, since they imply an ordering on the letters of the alphabet. On particular integer alphabets, Farach [5] showed that the construction can be done in linear time.
The minimization of the suffix trie gives the suffix automaton. The suffix automaton of a string is also known under the name of DAWG, for Directed Acyclic Word Graph. Its linearity was discovered by Blumer et al. (see [1]), who gave a linear construction (on a fixed alphabet). The minimality of the structure as an automaton is from Crochemore [2], who showed how to build the factor automaton of a text with the same complexity. The compaction of the suffix automaton gives the compact suffix automaton (see [1]). A direct construction algorithm of the compact suffix automaton was …
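The properties listed in the definition above can be illustrated with a small sketch. It is not one of the linear-time algorithms cited in the entry (Weiner, McCreight, Ukkonen): it is a naive quadratic-time construction that inserts the suffixes from the longest to the shortest into a compact trie. The names `Node`, `build_suffix_tree` and `branches` are hypothetical and chosen only for this illustration. Edge labels are stored as a (start, length) pair on y, as in the definition, and y is assumed to end with a symbol that occurs nowhere else in it.

```python
class Node:
    """A suffix-tree node (hypothetical helper class for this sketch)."""

    def __init__(self):
        # first letter of the edge label -> (start, length, child node)
        self.children = {}


def build_suffix_tree(y):
    """Naive O(n^2) construction: insert suffixes from longest to shortest.

    Assumes y is non-empty and ends with a terminator occurring nowhere else.
    """
    if not y or y[-1] in y[:-1]:
        raise ValueError("y must be non-empty and end with a unique terminator")
    n = len(y)
    root = Node()
    for i in range(n):
        node, pos = root, i
        while True:
            edge = node.children.get(y[pos])
            if edge is None:
                # No outgoing edge starts with this letter: attach a new leaf.
                node.children[y[pos]] = (pos, n - pos, Node())
                break
            start, length, child = edge
            # Length of the common prefix of the edge label and y[pos:].
            k = 0
            while k < length and y[start + k] == y[pos + k]:
                k += 1
            if k == length:
                # Whole edge matched: continue from the child node.
                node, pos = child, pos + k
                continue
            # Mismatch inside the edge: split it at offset k.
            mid = Node()
            mid.children[y[start + k]] = (start + k, length - k, child)
            mid.children[y[pos + k]] = (pos + k, n - pos - k, Node())
            node.children[y[pos]] = (start, k, mid)
            break
    return root


def branches(root, y):
    """Return the labels spelled by all root-to-leaf paths."""
    out = []
    stack = [(root, "")]
    while stack:
        node, prefix = stack.pop()
        if not node.children:
            out.append(prefix)
        for start, length, child in node.children.values():
            stack.append((child, prefix + y[start:start + length]))
    return out


y = "abab "  # the space sign is the terminator, as in the entry
root = build_suffix_tree(y)
print(sorted(branches(root, y)))
```

With the unique terminator, each suffix ends at its own leaf, so the root-to-leaf labels returned by `branches` are the n non-empty suffixes of y. Because every edge stores only two integers, the space used is proportional to the number of nodes rather than to the total length of the labels.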
Similar resources
Compact Suffix Trees Resemble PATRICIA Tries: Limiting Distribution of the Depth
Suffix trees are the most frequently used data structures in algorithms on words. In this paper, we consider the depth of a compact suffix tree, also known as the PAT tree, under some simple probabilistic assumptions. For a biased memoryless source, we prove that the limiting distribution for the depth in a PAT tree is the same as the limiting distribution for the depth in a PATRICIA trie, even...
The Virtual Suffix Tree: An Efficient Data Structure for Suffix Trees and Suffix Arrays
We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. Starting from the suffix array, we construct the suffix tree, from which we derive the virtual suffix tree. The VST provides the same functionality as the suffix tree, including suffix links, but at a much smaller space requirement. It has the same linear time construction even for large ...
A Dynamic Approach to Weighted Suffix Tree Construction Algorithm
At present, the weighted suffix tree is considered one of the most important existing data structures for analyzing molecular weighted sequences. Although a static-partitioning-based parallel algorithm exists for the construction of the weighted suffix tree, it takes a significant amount of time for very long weighted DNA sequences. However, in our implementation of dynamic partition based...
Suffix Tree of Alignment: An Efficient Index for Similar Data
We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings A and B is a compacted trie representing all suffixes in A and B. It has |A|+ |B| leaves and can be constructed in O(|A|+ |B|) time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not ex...
Study of Data Localities in Suffix-Tree Based Genetic Algorithms
This paper focuses on the study of cache localities of two genetic algorithms based on the Suffix Tree structure, as well as a description of the cache performance of the Suffix Tree.
2 Compact Suffix Arrays
The suffix array data structure that we present is due to Grossi and Vitter [1]. It uses a recursive construction that inflates the alphabet size, much like the suffix array construction that we saw in Lecture 18. Building on this, we will construct a low-space suffix tree by augmenting this suffix array with an additional tree structure. This construction is due to Munro, Raman and Rao [2]...
Publication date: 2009